Programming environment

Load raw data

Data preprocessing

Data cleaning

Data partition

## Figure 1. GA (week) distribution for each partition and independent test. GA, gestational age.

Numerical data transformation

## Figure 2. QQ plot of numerical variables. QQ, quantile-quantile.
## Numerical variables with normal distribution by QQ plots: maternal_age, bbw, supplemental_o2_duration, delta_bw, delta_hc, hospital_stay, m24_cognitive_composite_score, m24_language_composite_score, m24_motor_composite_score, m24_cong_ss, m24_rc_ss, m24_ec_ss, m24_fm_ss, m24_gm_ss, m_chat_r, m_chat_f
Table 1. Normality test.
variable statistic p.value alternative
imv_duration* 0.319 <0.001 two-sided
ga* 0.140 <0.001 two-sided
Note:
*, p-value <=0.05, Shapiro-Wilk normality test.
## Numerical variables that are not normally distributed: ga, imv_duration
## Choices for a transformation technique: log, sqrt, inv, log2, exp, asinh, bct
Table 2. Normality test after transformation.
variable best_trans p.value log sqrt inv log2 exp asinh bct
imv_duration* sqrt <0.001 NA <0.001 NA NA <0.001 <0.001 NA
ga* bct <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001 <0.001
Note:
*, p-value <=0.05, Shapiro-Wilk normality test.
## Numerical variables that are not normal after transformation: ga, imv_duration
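The NA entries for imv_duration in Table 2 (log, inv, log2, bct) are what one would expect if that variable contains zeros, since those transformations are undefined at zero. The sketch below illustrates this with domain-guarded versions of the candidate transformations. It is an illustration under stated assumptions, not the code used for this report: the helper names and the example durations are hypothetical, and "bct" is left out because it is assumed to be a Box-Cox-type transformation that needs a fitted parameter and strictly positive data.

```python
import math

def safe(fn):
    """Wrap a transformation so that out-of-domain inputs return None (NA)."""
    def wrapped(x):
        try:
            y = fn(x)
        except (ValueError, ZeroDivisionError, OverflowError):
            return None
        return y if math.isfinite(y) else None
    return wrapped

# Candidate transformations listed in the output above ("bct" omitted, see lead-in).
TRANSFORMS = {
    "log": safe(math.log),
    "sqrt": safe(math.sqrt),
    "inv": safe(lambda x: 1.0 / x),
    "log2": safe(math.log2),
    "exp": safe(math.exp),
    "asinh": safe(math.asinh),
}

def usable_transforms(values):
    """Return the names of transformations defined for every value."""
    return [name for name, fn in TRANSFORMS.items()
            if all(fn(v) is not None for v in values)]

# Hypothetical durations in days, including zeros.
durations = [0, 0, 1, 3, 7, 14, 30]
print(usable_transforms(durations))
```

Only the transformations returned by `usable_transforms` would then be passed to a normality test; the others would be reported as NA.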

Outlier analysis

## Relevant numerical var after transformation:

Correlation matrix

## Categorical variables with a category of 1 value: ms_delta_hc, ms_imv_duration, ms_supplemental_o2_duration
Table 3. Categorical variables with pair-wise perfect separation.
V1 V1_value V2 V2_value n
mother_education_below_university 1-yes family_social_risk_index 0-no family social risk 0
low_family_ses 1-yes family_social_risk_index 0-no family social risk 0
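Table 3 lists category pairs that never occur together (n = 0). A minimal sketch of how such empty cells can be found by cross-tabulating each pair of categorical variables is given below. The function name and the example rows are hypothetical, and the category labels other than those shown in Table 3 are invented for illustration.

```python
from collections import Counter
from itertools import combinations, product

def empty_cells(rows, columns):
    """Yield (v1, v1_value, v2, v2_value) for every category pair with n = 0."""
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    for v1, v2 in combinations(columns, 2):
        counts = Counter((r[v1], r[v2]) for r in rows)
        for a, b in product(levels[v1], levels[v2]):
            if counts[(a, b)] == 0:
                yield (v1, a, v2, b)

# Hypothetical records: no subject combines low SES with "no family social risk".
rows = [
    {"low_family_ses": "0-no", "family_social_risk_index": "0-no family social risk"},
    {"low_family_ses": "0-no", "family_social_risk_index": "1-one family social risk"},
    {"low_family_ses": "0-no", "family_social_risk_index": "2-two or more"},
    {"low_family_ses": "1-yes", "family_social_risk_index": "1-one family social risk"},
    {"low_family_ses": "1-yes", "family_social_risk_index": "2-two or more"},
]
for cell in empty_cells(rows, ["low_family_ses", "family_social_risk_index"]):
    print(cell)
```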

## Figure 3. Correlation matrix. *, based on p-value (frequentist) or 95% CI (Bayesian); †, pair of variable and its missingness; ‡, at least 1 variable was a categorical variable with a category of 1 value; §, insufficient sample size.

Missing value imputation

Descriptive statistics

Table 4. Sample characteristics.
Variable ASD (-) ASD (+) p-value
Total (prevalence) n (%) 245 (81.4) 56 (18.6)
24 m/o Composite Scores of cognitive score average (SD) 96.71 (10.13) 88.39 (11.41) <0.001
24 m/o Composite Scores of language score average (SD) 93.73 (10.08) 80.88 (11.3) <0.001
24 m/o Composite Scores of motor score average (SD) 95.54 (9.98) 88.64 (11.19) <0.001
24 m/o scaled scores of cognitive score average (SD) 9.34 (2.03) 7.68 (2.28) <0.001
24 m/o scaled scores of receptive language score average (SD) 8.94 (1.73) 7 (2) <0.001
24 m/o scaled scores of expressive language score average (SD) 8.88 (2.15) 6.38 (2.18) <0.001
24 m/o scaled scores of fine motor score average (SD) 9.87 (2.09) 8.21 (2.09) <0.001
24 m/o scaled scores of gross motor score average (SD) 8.6 (1.9) 7.98 (2.09) >0.05
M chat R score score average (SD) 0.48 (0.89) 1.89 (2.67) <0.001
M chat F score score average (SD) 0.5 (0.89) 1.88 (2.66) <0.001
Maternal age year average (SD) 32.69 (4.6) 32.07 (5.71) >0.05
Mother education, below university no % (n) 61 (150) 52 (29) >0.05
yes % (n) 39 (95) 48 (27) >0.05
Low family socioeconomic status no % (n) 73 (179) 75 (42) >0.05
yes % (n) 27 (66) 25 (14) >0.05
Family Social Risk no family social risk % (n) 56 (136) 48 (27) >0.05
one family social risk % (n) 21 (52) 25 (14) >0.05
≥2 family social risk % (n) 23 (57) 27 (15) >0.05
Gestational age at birth week average (SD) 27.89 (2.08) 27.34 (2.52) >0.05
Male no % (n) 51 (126) 29 (16) 0.007
yes % (n) 49 (119) 71 (40) 0.007
Birth body weight g average (SD) 1080.99 (250.72) 1044.95 (326.62) >0.05
Small for gestational age no % (n) 95 (233) 88 (49) >0.05
yes % (n) 5 (12) 12 (7) >0.05
Requiring surfactant no % (n) 60 (148) 46 (26) >0.05
yes % (n) 40 (97) 54 (30) >0.05
Supplemental oxygen duration day average (SD) 52.5 (30.74) 67.79 (47.23) 0.049
IMV duration day average (SD) 5.75 (13.1) 8.62 (14.27) >0.05
hs-PDA requiring ligation no % (n) 91 (222) 88 (49) >0.05
yes % (n) 9 (23) 12 (7) >0.05
Sepsis no % (n) 81 (199) 71 (40) >0.05
yes % (n) 19 (46) 29 (16) >0.05
Severe brain injury no % (n) 89 (218) 89 (50) >0.05
yes % (n) 11 (27) 11 (6) >0.05
Severe NEC no % (n) 92 (225) 88 (49) >0.05
yes % (n) 8 (20) 12 (7) >0.05
Surgery of the abdomen no % (n) 95 (232) 93 (52) >0.05
yes % (n) 5 (13) 7 (4) >0.05
BPD no % (n) 71 (174) 70 (39) >0.05
yes % (n) 29 (71) 30 (17) >0.05
Severe ROP no % (n) 89 (217) 88 (49) >0.05
yes % (n) 11 (28) 12 (7) >0.05
Delta body weight z score delta Z-score average (SD) -1.36 (0.98) -1.39 (1.04) >0.05
Delta head circumference z score delta Z-score average (SD) -1.14 (1.18) -1.26 (1.28) >0.05
Hospital stay day average (SD) 67.96 (30.57) 79.98 (41.38) >0.05
Note:
ASD, autism spectrum disorder; BPD, bronchopulmonary dysplasia; hs-PDA, hemodynamically significant patent ductus arteriosus; IMV, intermittent mandatory ventilation; m/o, month old; NEC, necrotizing enterocolitis; ROP, retinopathy of prematurity; SD, standard deviation.

Feature extraction

Univariate regression analysis

Table 5. Univariate regression analysis.
Variable OR LB UB p-value
24 m/o Composite Scores of cognitive* score 0.932 0.906 0.959 <0.001
24 m/o Composite Scores of language* score 0.9 0.873 0.929 <0.001
24 m/o Composite Scores of motor* score 0.943 0.917 0.97 <0.001
24 m/o scaled scores of cognitive* score 0.704 0.609 0.813 <0.001
24 m/o scaled scores of receptive language* score 0.571 0.477 0.682 <0.001
24 m/o scaled scores of expressive language* score 0.619 0.535 0.717 <0.001
24 m/o scaled scores of fine motor* score 0.681 0.581 0.797 <0.001
M chat R score* score 1.797 1.396 2.312 <0.001
M chat F score* score 1.771 1.377 2.278 <0.001
Male* yes vs. no 2.647 1.408 4.978 0.003
Supplemental oxygen duration* day 1.012 1.004 1.019 0.004
Hospital stay* day 1.01 1.002 1.018 0.017
24 m/o scaled scores of gross motor* score 0.864 0.755 0.989 0.034
Small for gestational age* yes vs. no 2.774 1.039 7.403 0.042
Requiring surfactant yes vs. no 1.761 0.982 3.158 0.058
Gestational age at birth week 0.893 0.784 1.017 0.089
Sepsis yes vs. no 1.73 0.892 3.357 0.105
IMV duration day 1.014 0.995 1.033 0.158
Mother education, below university yes vs. no 1.47 0.82 2.635 0.196
Severe NEC yes vs. no 1.607 0.644 4.011 0.309
Birth body weight g 0.999 0.998 1.001 0.361
Maternal age year 0.974 0.918 1.034 0.39
Family Social Risk one vs. no family social risk 1.356 0.66 2.787 0.407
≥2 vs. no family social risk 1.326 0.656 2.677 0.432
hs-PDA requiring ligation yes vs. no 1.379 0.56 3.394 0.485
Delta head circumference z score delta Z-score 0.92 0.722 1.174 0.504
Surgery of the abdomen yes vs. no 1.373 0.43 4.38 0.593
Low family socioeconomic status yes vs. no 0.904 0.464 1.762 0.767
Delta body weight z score delta Z-score 0.964 0.72 1.291 0.807
Severe ROP yes vs. no 1.107 0.457 2.681 0.822
BPD yes vs. no 1.068 0.567 2.012 0.838
Severe brain injury yes vs. no 0.969 0.38 2.472 0.947
Note:
*, either LB > 1 or UB < 1 and p-value <= 0.05. BPD, bronchopulmonary dysplasia; hs-PDA, hemodynamically significant patent ductus arteriosus; IMV, intermittent mandatory ventilation; m/o, month old; LB, lower bound; NEC, necrotizing enterocolitis; OR, odds ratio; ROP, retinopathy of prematurity; UB, upper bound.
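For a binary predictor, the OR, LB, and UB in Table 5 can be checked directly from the counts in Table 4. The sketch below computes an odds ratio with a Wald 95% interval from a 2x2 table. It is a worked check rather than the code behind the table: the function name is hypothetical, and the Wald interval is an assumption about how the bounds were obtained.

```python
import math

def odds_ratio_wald(exp_pos, exp_neg, unexp_pos, unexp_neg, z=1.959964):
    """Odds ratio and Wald 95% bounds from the four cells of a 2x2 table."""
    log_or = math.log((exp_pos * unexp_neg) / (exp_neg * unexp_pos))
    se = math.sqrt(1 / exp_pos + 1 / exp_neg + 1 / unexp_pos + 1 / unexp_neg)
    # Returns (OR, LB, UB) on the odds-ratio scale.
    return tuple(math.exp(log_or + sign * z * se) for sign in (0, -1, 1))

# Counts from Table 4: male 40 ASD (+) and 119 ASD (-); not male 16 ASD (+) and 126 ASD (-).
odds_ratio, lb, ub = odds_ratio_wald(40, 119, 16, 126)
print(round(odds_ratio, 3), round(lb, 3), round(ub, 3))

# Flag used in the table note: the interval excludes 1.
print(lb > 1 or ub < 1)
```

With these counts the result agrees, up to rounding, with the Male row of Table 5 (OR 2.647, LB 1.408, UB 4.978).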

Covariate selection

Multivariate regression analysis

Table 6. Multivariate regression analysis.
Variable OR LB UB p-value Covariates
24 m/o Composite Scores of language* score 0.9 0.873 0.929 <0.001
24 m/o scaled scores of expressive language* score 0.619 0.535 0.717 <0.001
24 m/o scaled scores of receptive language* score 0.75 0.598 0.939 0.012 24 m/o scaled scores of expressive language
24 m/o scaled scores of fine motor* score 0.828 0.698 0.983 0.031 24 m/o scaled scores of expressive language
24 m/o scaled scores of cognitive* score 0.704 0.609 0.813 <0.001
24 m/o Composite Scores of cognitive* score 0.932 0.906 0.959 <0.001
M chat R score* score 1.797 1.396 2.312 <0.001
M chat F score* score 1.771 1.377 2.278 <0.001
24 m/o Composite Scores of motor* score 0.943 0.917 0.97 <0.001
Male* yes vs. no 2.647 1.408 4.978 0.003
Supplemental oxygen duration* day 1.012 1.004 1.019 0.004
Hospital stay* day 1.01 1.002 1.018 0.017
24 m/o scaled scores of gross motor* score 0.864 0.755 0.989 0.034
Small for gestational age* yes vs. no 2.774 1.039 7.403 0.042
Note:
*, either LB > 1 or UB < 1 and p-value <= 0.05.

## Figure 4. Effect size changes before and after adjustment for confounders.

Feature selection

Predictive modeling

Model evaluation

## Figure 5. Sample size estimation at the neonatal risk step. The blue line indicates the LOESS fit. The red line indicates the modified exponential decay fit. p(a) is the p-value of a, where a is the parameter that scales the exponential term in the model; a higher value generally means a quicker initial increase. p(k) is the p-value of k, where k is the rate of decay or growth in the exponential term; a positive k implies decay (as x increases, further gains in y diminish), while a negative k suggests an erroneous model fit in this context, because an increase in x should not lead to a decrease in y. AUC-ROC, area under curve of receiver operating characteristics; LOESS, locally estimated scatterplot smoothing.

## Figure 6. Sample size estimation at the BSID-III step. The blue line indicates the LOESS fit. The red line indicates the modified exponential decay fit. p(a) is the p-value of a, where a is the parameter that scales the exponential term in the model; a higher value generally means a quicker initial increase. p(k) is the p-value of k, where k is the rate of decay or growth in the exponential term; a positive k implies decay (as x increases, further gains in y diminish), while a negative k suggests an erroneous model fit in this context, because an increase in x should not lead to a decrease in y. AUC-ROC, area under curve of receiver operating characteristics; LOESS, locally estimated scatterplot smoothing.

## Figure 7. Sample size estimation at the M-CHAT-R step. The blue line indicates the LOESS fit. The red line indicates the modified exponential decay fit. p(a) is the p-value of a, where a is the parameter that scales the exponential term in the model; a higher value generally means a quicker initial increase. p(k) is the p-value of k, where k is the rate of decay or growth in the exponential term; a positive k implies decay (as x increases, further gains in y diminish), while a negative k suggests an erroneous model fit in this context, because an increase in x should not lead to a decrease in y. AUC-ROC, area under curve of receiver operating characteristics; LOESS, locally estimated scatterplot smoothing.

## Figure 8. Sample size estimation at the M-CHAT-F step. The blue line indicates the LOESS fit. The red line indicates the modified exponential decay fit. p(a) is the p-value of a, where a is the parameter that scales the exponential term in the model; a higher value generally means a quicker initial increase. p(k) is the p-value of k, where k is the rate of decay or growth in the exponential term; a positive k implies decay (as x increases, further gains in y diminish), while a negative k suggests an erroneous model fit in this context, because an increase in x should not lead to a decrease in y. AUC-ROC, area under curve of receiver operating characteristics; LOESS, locally estimated scatterplot smoothing.

## Figure 9. Sample size estimation at the M-CHAT-R/F step. The blue line indicates the LOESS fit. The red line indicates the modified exponential decay fit. p(a) is the p-value of a, where a is the parameter that scales the exponential term in the model; a higher value generally means a quicker initial increase. p(k) is the p-value of k, where k is the rate of decay or growth in the exponential term; a positive k implies decay (as x increases, further gains in y diminish), while a negative k suggests an erroneous model fit in this context, because an increase in x should not lead to a decrease in y. AUC-ROC, area under curve of receiver operating characteristics; LOESS, locally estimated scatterplot smoothing.

## Figure 10. Sample size estimation at the M-CHAT-RF step. The blue line indicates the LOESS fit. The red line indicates the modified exponential decay fit. p(a) is the p-value of a, where a is the parameter that scales the exponential term in the model; a higher value generally means a quicker initial increase. p(k) is the p-value of k, where k is the rate of decay or growth in the exponential term; a positive k implies decay (as x increases, further gains in y diminish), while a negative k suggests an erroneous model fit in this context, because an increase in x should not lead to a decrease in y. AUC-ROC, area under curve of receiver operating characteristics; LOESS, locally estimated scatterplot smoothing.
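The captions of Figures 5 to 10 describe the roles of a and k but do not give the fitted equation. One form consistent with that description is y = c - a * exp(-k * x), where x is the sample size, y is the AUC-ROC, and c is a plateau. The sketch below uses this assumed form only to illustrate why a positive k gives diminishing gains and a negative k gives a decreasing curve; the function name and all parameter values are hypothetical.

```python
import math

def saturating_curve(x, c, a, k):
    """Assumed form y = c - a * exp(-k * x); not confirmed by the report."""
    return c - a * math.exp(-k * x)

def successive_gains(xs, c, a, k):
    """Change in y between consecutive sample sizes."""
    ys = [saturating_curve(x, c, a, k) for x in xs]
    return [later - earlier for earlier, later in zip(ys, ys[1:])]

sample_sizes = [0, 100, 200, 300]

# Positive k: y rises toward the plateau c and each step adds less.
gains_decay = successive_gains(sample_sizes, c=0.75, a=0.25, k=0.01)
print([round(g, 3) for g in gains_decay])

# Negative k: y falls as x grows, the pattern the captions call an erroneous fit.
gains_growth = successive_gains(sample_sizes, c=0.75, a=0.25, k=-0.01)
print([round(g, 3) for g in gains_growth])
```

Under this form the initial slope is a * k, which matches the caption's remark that a larger a means a quicker initial increase.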
Table 7. Cost of outcomes.
Step How much is the cost of detected/treated disease, e.g., severe risk of death or severe morbidity despite detection, plus the cost of intervening? How much is the cost of an unneeded intervention, e.g., invasiveness of testing, complication risk of treatment? How much is the cost of an undetected disease, e.g., severe risk of death or severe morbidity? How much is the cost of applying the risk model?
Universal N 30 20 80 2
Universal NB 30 20 80 5
Universal NBR 30 20 80 9
Universal NBF 30 20 80 12
Universal NBR/F 30 20 80 11
Universal NBRF 30 20 80 16
Universal B 30 20 80 3
Note:
BSID-III, Bayley scales of infant development version III; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-R/F, if M-CHAT-R ≥3 and M-CHAT-F ≥2, then 1, otherwise 0; N, neonatal risk factors; NB, neonatal risk factors and BSID-III; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NBF, neonatal risk factors, BSID-III, and M-CHAT-F; NBR/F, neonatal risk factors, BSID-III, and M-CHAT-R/F; NBRF, neonatal risk factors, BSID-III, M-CHAT-R, and M-CHAT-F.
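How the costs in Table 7 enter the cost-aware threshold is not stated in this output. One common approach, shown here purely as an assumption, assigns the first cost to each true positive, the second to each false positive, the third to each false negative, and the fourth to every subject scored, then picks the threshold with the lowest total. The function names and the example labels and probabilities are hypothetical.

```python
def total_cost(y_true, y_prob, threshold, c_tp, c_fp, c_fn, c_model):
    """Total cost of classifying at `threshold` under the assumed cost mapping."""
    cost = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            cost += c_tp      # detected and treated disease
        elif predicted_positive:
            cost += c_fp      # unneeded intervention
        elif y == 1:
            cost += c_fn      # undetected disease
        cost += c_model       # applying the risk model to this subject
    return cost

def cost_aware_threshold(y_true, y_prob, c_tp, c_fp, c_fn, c_model):
    """Pick the observed probability that minimizes total cost."""
    candidates = sorted(set(y_prob))
    return min(candidates,
               key=lambda t: total_cost(y_true, y_prob, t, c_tp, c_fp, c_fn, c_model))

# Hypothetical validation labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]

# Costs from the "Universal N" row of Table 7.
best = cost_aware_threshold(y_true, y_prob, c_tp=30, c_fp=20, c_fn=80, c_model=2)
print(best, total_cost(y_true, y_prob, best, 30, 20, 80, 2))
```

Under this mapping the model cost is the same at every threshold, so it does not move the threshold within a step; it only matters when total costs are compared across steps.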

## Figure 11. ASD prediction model calibration plots using neonatal risk based on validation set: (A) RR; (B) NB; (C) KNN; (D) RF; and (E) DNN. Vertical dashed lines indicate the cost-aware threshold. ASD, autism spectrum disorder; DNN, deep neural network; KNN, k-nearest neighbor; NB, naive Bayes; RF, random forest; RR, ridge regression.

## Figure 12. ASD prediction model calibration plots using BSID-III based on validation set: (A) DT; and (B) RF. Vertical dashed lines indicate the cost-aware threshold. ASD, autism spectrum disorder; BSID-III, Bayley scales of infant development version III; DT, decision tree; RF, random forest.

## Figure 13. ASD prediction model calibration plots using M-CHAT-R based on validation set: (A) DT; (B) RF; and (C) GBM. Vertical dashed lines indicate the cost-aware threshold. ASD, autism spectrum disorder; M-CHAT-R, modified checklist for autism in toddlers, revised; DT, decision tree; GBM, gradient boosting machine; RF, random forest.

## Figure 14. ASD prediction model calibration plots using M-CHAT-F based on validation set: (A) DT; (B) RF. Vertical dashed lines indicate the cost-aware threshold. ASD, autism spectrum disorder; M-CHAT-F, modified checklist for autism in toddlers, follow-up; DT, decision tree; RF, random forest.

## Figure 15. ASD prediction model calibration plots using M-CHAT-R/F based on validation set: (A) DNN; (B) DT; (C) RF. Vertical dashed lines indicate the cost-aware threshold. ASD, autism spectrum disorder; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-R/F, if M-CHAT-R ≥3 and M-CHAT-F ≥2, then 1, otherwise 0; DNN, deep neural network; DT, decision tree; RF, random forest.

## Figure 16. ASD prediction model calibration plots using M-CHAT-RF based on validation set: (A) DT; (B) RF. Vertical dashed lines indicate the cost-aware threshold. ASD, autism spectrum disorder; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-RF, both M-CHAT-R and M-CHAT-F were used; DNN, deep neural network; DT, decision tree; RF, random forest.

## Figure 17. SHAP beeswarm plot of best prediction models using neonatal risk factors: (A) RR; and (B) KNN. Interval estimates of SHAP values are shown for higher feature values (>= average). CI, confidence interval; OR, odds ratio; SHAP, Shapley additive explanation.

## Figure 18. SHAP beeswarm plot of best prediction models using BSID-III: (A) DT; and (B) RF. Interval estimates of SHAP values are shown for higher feature values (>= average). BSID-III, Bayley scales of infant development version III; CI, confidence interval; OR, odds ratio; SHAP, Shapley additive explanation.

## Figure 19. SHAP beeswarm plot of best prediction models using M-CHAT-R: (A) RF; and (B) GBM. Interval estimates of SHAP values are shown for higher feature values (>= average). CI, confidence interval; M-CHAT-R, modified checklist for autism in toddlers, revised; OR, odds ratio; SHAP, Shapley additive explanation.

## Figure 20. SHAP beeswarm plot of best prediction model using M-CHAT-F: (A) RF. Interval estimates of SHAP values are shown for higher feature values (>= average). CI, confidence interval; M-CHAT-F, modified checklist for autism in toddlers, follow-up; OR, odds ratio; SHAP, Shapley additive explanation.

## Figure 21. SHAP beeswarm plot of best prediction models using M-CHAT-R/F: (A) DT; (B) RF; and (C) DNN. Interval estimates of SHAP values are shown for higher feature values (>= average). CI, confidence interval; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-R/F, if M-CHAT-R ≥3 and M-CHAT-F ≥2, then 1, otherwise 0; DNN, deep neural network; OR, odds ratio; SHAP, Shapley additive explanation.

## Figure 22. SHAP beeswarm plot of best prediction model using M-CHAT-RF: (A) RF. Interval estimates of SHAP values are shown for higher feature values (>= average). CI, confidence interval; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-RF, both M-CHAT-R and M-CHAT-F were used; DNN, deep neural network; OR, odds ratio; SHAP, Shapley additive explanation.

## Figure 23. SHAP beeswarm plot of best prediction models using only BSID-III: (A) DT; (B) RF; and (C) GBM. Interval estimates of SHAP values are shown for higher feature values (>= average). CI, confidence interval; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-RF, both M-CHAT-R and M-CHAT-F were used; DNN, deep neural network; OR, odds ratio; SHAP, Shapley additive explanation.

## Figure 24. Feature impact alignment with multivariate regression analysis. *, aligned according to positive-to-negative ratio; †, aligned according to clinician assessment; ‡, unclear alignment according to clinician assessment; BSID-III, Bayley scales of infant development version III; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-R/F, if M-CHAT-R ≥3 and M-CHAT-F ≥2, then 1, otherwise 0; M-CHAT-RF, both M-CHAT-R and M-CHAT-F were used; N, neonatal risk factors; NB, neonatal risk factors and BSID-III; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NBF, neonatal risk factors, BSID-III, and M-CHAT-F; NBR/F, neonatal risk factors, BSID-III, and M-CHAT-R/F; NBRF, neonatal risk factors, BSID-III, and M-CHAT-RF.
## Best model: KNN

## Figure 25. ASD prediction model decision (A) and ROC (B) curves using neonatal risk based on validation set. Performances using the selected threshold are shown by dashed line in (A) and points in (B). ASD, autism spectrum disorder; DNN, deep neural network; KNN, k-nearest neighbor; NB, naive Bayes; RF, random forest; ROC, receiver operating characteristics; RR, ridge regression.
## Best model: GBM

## Figure 26. ASD prediction model decision (A) and ROC (B) curves using M-CHAT-R based on validation set. Performances using the selected threshold are shown by dashed line in (A) and points in (B). ASD, autism spectrum disorder; M-CHAT-R, modified checklist for autism in toddlers, revised; DT, decision tree; GBM, gradient boosting machine; RF, random forest; ROC, receiver operating characteristics.
## Best model: GBM

## Figure 27. ASD prediction model decision (A) and ROC (B) curves using only BSID-III based on validation set. Performances using the selected threshold are shown by dashed line in (A) and points in (B). ASD, autism spectrum disorder; BSID-III, Bayley scales of infant development version III; GBM, gradient boosting machine; ROC, receiver operating characteristics.
Table 8. Validation results (95% CI).
Step Algorithm AUC-ROC F1 TPR PPV TNR NPV
Universal N KNN 0.682 (0.655, 0.709) 0.35 (0.32, 0.38) 58 (52, 64) 25 (23, 28) 68 (66, 71) 90 (88, 91)
Universal NBR GBM 0.749 (0.726, 0.771) 0.37 (0.33, 0.41) 49 (44, 53) 31 (27, 35) 79 (77, 81) 89 (88, 90)
Universal B GBM 0.637 (0.604, 0.67) 0.32 (0.28, 0.36) 39 (34, 44) 29 (25, 33) 81 (79, 83) 87 (86, 89)
Note:
AUC-ROC, area under curve of receiver operating characteristics; BSID-III, Bayley scales of infant development version III; CI, confidence interval; F1, 2 x precision x recall / (precision + recall); FNR, false negative rate or 1 - sensitivity; FPR, false positive rate or 1 - specificity; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; N, neonatal risk factors; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NPV, negative predictive value; PPV, positive predictive value or precision; TNR, true negative rate or specificity; TPR, true positive rate or sensitivity/recall.
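The threshold-dependent metrics reported in Table 8 follow from the four cells of a confusion matrix. A minimal sketch of these definitions is given below; the function name and the example counts are hypothetical and are not taken from the validation set.

```python
def screening_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)   # sensitivity / recall
    tnr = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * tpr / (ppv + tpr)
    return {
        "TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv, "F1": f1,
        "FNR": 1 - tpr,    # false negative rate
        "FPR": 1 - tnr,    # false positive rate
    }

# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=20, fp=40, tn=160, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Table 8 reports TPR, PPV, TNR, and NPV as percentages, so the proportions above would be multiplied by 100 to match that layout.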

## Figure 28. ASD prediction step comparison. AUC-ROC, area under curve of receiver operating characteristics; ASD, autism spectrum disorder; BSID-III, Bayley scales of infant development version III; CI, confidence interval; F1, 2 x precision x recall / (precision + recall); FNR, false negative rate or 1 - sensitivity; FPR, false positive rate or 1 - specificity; M-CHAT-R, modified checklist for autism in toddlers, revised; N, neonatal risk factors; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NPV, negative predictive value; PPV, positive predictive value or precision; TNR, true negative rate or specificity; TPR, true positive rate or sensitivity/recall.

Predictive modeling 2

Model evaluation 2

Table 9. Validation 2 results (95% CI).
Step Algorithm F1 TPR PPV TNR NPV
N0-B Mixed 0.34 (0.31, 0.37) 75 (70, 79) 22 (20, 24) 50 (48, 52) 91 (90, 93)
N1-B Mixed 0.35 (0.31, 0.4) 34 (29, 39) 40 (34, 45) 90 (89, 92) 88 (86, 89)
N0-B0-NBR Mixed 0.34 (0.31, 0.37) 75 (70, 79) 22 (20, 24) 50 (48, 52) 91 (90, 93)
N1-B0-NBR Mixed 0.34 (0.29, 0.38) 34 (29, 39) 36 (31, 41) 89 (87, 90) 87 (86, 89)
N0-B1-NBR Mixed 0.35 (0.32, 0.38) 66 (61, 71) 24 (22, 26) 61 (58, 63) 90 (89, 92)
N1-B1-NBR Mixed 0.34 (0.28, 0.39) 25 (19, 30) 56 (47, 65) 97 (96, 97) 87 (86, 88)
Universal R None 0.29 (0.24, 0.33) 15 (11, 20) 48 (36, 60) 97 (96, 98) 86 (84, 87)
Note:
BSID-III, Bayley scales of infant development version III; CI, confidence interval; F, M-CHAT-F; F1, 2 x precision x recall / (precision + recall); FNR, false negative rate or 1 - sensitivity; FPR, false positive rate or 1 - specificity; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-R/F, if M-CHAT-R ≥3 and M-CHAT-F ≥2, then 1, otherwise 0; M-CHAT-RF, both M-CHAT-R and M-CHAT-F were used; N, neonatal risk factors; NB, neonatal risk factors and BSID-III; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NBF, neonatal risk factors, BSID-III, and M-CHAT-F; NBR/F, neonatal risk factors, BSID-III, and M-CHAT-R/F; NBRF, neonatal risk factors, BSID-III, and M-CHAT-RF; NPV, negative predictive value; PPV, positive predictive value or precision; R, M-CHAT-R; R/F, M-CHAT-R/F; TNR, true negative rate or specificity; TPR, true positive rate or sensitivity/recall.

## Figure 29. ASD combined prediction comparison. ASD, autism spectrum disorder; Avg., average; BSID-III, Bayley scales of infant development version III; CI, confidence interval; F1, 2 x precision x recall / (precision + recall); FNR, false negative rate or 1 - sensitivity; FPR, false positive rate or 1 - specificity; M-CHAT-R, modified checklist for autism in toddlers, revised; N, neonatal risk factors; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NPV, negative predictive value; PPV, positive predictive value or precision; TNR, true negative rate or specificity; TPR, true positive rate or sensitivity/recall.
Table 10. Test results (95% CI).
Step Algorithm F1 TPR PPV TNR NPV
Universal N KNN 0.19 (0.16, 0.22) 34 (30, 37) 14 (11, 16) 54 (52, 55) 79 (78, 81)
Universal NBR GBM 0.32 (0.29, 0.35) 43 (39, 47) 26 (23, 29) 75 (73, 76) 86 (85, 88)
Universal B GBM 0.19 (0.16, 0.22) 19 (15, 23) 16 (12, 19) 79 (77, 81) 82 (81, 84)
N0-B Mixed 0.26 (0.24, 0.29) 60 (57, 63) 17 (15, 19) 38 (36, 39) 82 (80, 83)
N0-B0-NBR Mixed 0.26 (0.24, 0.29) 60 (57, 63) 17 (15, 19) 38 (36, 39) 82 (80, 83)
Universal R None 0.42 (0.38, 0.47) 39 (34, 43) 49 (43, 54) 92 (90, 93) 88 (86, 89)
Note:
BSID-III, Bayley scales of infant development version III; CI, confidence interval; F, M-CHAT-F; F1, 2 x precision x recall / (precision + recall); FNR, false negative rate or 1 - sensitivity; FPR, false positive rate or 1 - specificity; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-R/F, if M-CHAT-R ≥3 and M-CHAT-F ≥2, then 1, otherwise 0; M-CHAT-RF, both M-CHAT-R and M-CHAT-F were used; N, neonatal risk factors; NB, neonatal risk factors and BSID-III; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NBF, neonatal risk factors, BSID-III, and M-CHAT-F; NBR/F, neonatal risk factors, BSID-III, and M-CHAT-R/F; NBRF, neonatal risk factors, BSID-III, and M-CHAT-RF; NPV, negative predictive value; PPV, positive predictive value or precision; R, M-CHAT-R; R/F, M-CHAT-R/F; TNR, true negative rate or specificity; TPR, true positive rate or sensitivity/recall.

Model deployment

Table 11. Prevalence for observed and predicted outcome in all sets (N = 470).
Variable Step ASD (+) n (%) ASD (-) n (%)
Observed outcome N/A 84 (18) 386 (82)
Universal N N/A 171 (36) 299 (64)
Universal NBR N/A 103 (22) 367 (78)
Universal B N/A 169 (36) 301 (64)
N0-B0-NBR Universal N 171 (36) 0 (0)
NBR 3 (1) 208 (44)
B 88 (19) 0 (0)
N0-B Universal N 171 (36) 0 (0)
B 88 (19) 211 (45)
Universal R N/A 35 (7) 435 (93)
Note:
ASD, autism spectrum disorder; BSID-III, Bayley scales of infant development version III; CI, confidence interval; F, M-CHAT-F; F1, 2 x precision x recall / (precision + recall); FNR, false negative rate or 1 - sensitivity; FPR, false positive rate or 1 - specificity; M-CHAT-F, modified checklist for autism in toddlers, follow-up; M-CHAT-R, modified checklist for autism in toddlers, revised; M-CHAT-R/F, if M-CHAT-R ≥3 and M-CHAT-F ≥2, then 1, otherwise 0; M-CHAT-RF, both M-CHAT-R and M-CHAT-F were used; N, neonatal risk factors; N/A, not applicable; NB, neonatal risk factors and BSID-III; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; NBF, neonatal risk factors, BSID-III, and M-CHAT-F; NBR/F, neonatal risk factors, BSID-III, and M-CHAT-R/F; NBRF, neonatal risk factors, BSID-III, and M-CHAT-RF; NPV, negative predictive value; PPV, positive predictive value or precision; R, M-CHAT-R; R/F, M-CHAT-R/F; TNR, true negative rate or specificity; TPR, true positive rate or sensitivity/recall.
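The counts in Table 11 suggest that a combined step such as N0-B0-NBR works as a cascade: each model is applied only to subjects the previous model predicted as ASD (-), and a subject is flagged at the first step that predicts ASD (+). This reading is inferred from the counts and is not stated explicitly in the output. The sketch below, with hypothetical function names, traces the counts under that reading.

```python
def cascade_counts(n_total, flagged_per_step):
    """Return (step, flagged, remaining_negative) after each cascade step."""
    remaining = n_total
    rows = []
    for step, flagged in flagged_per_step:
        remaining -= flagged   # flagged subjects leave the cascade
        rows.append((step, flagged, remaining))
    return rows

def cascade_label(step_predictions):
    """Final label for one subject: 1 at the first step predicting 1, else 0."""
    for step, prediction in step_predictions:
        if prediction == 1:
            return 1, step
    return 0, step_predictions[-1][0]

# Flagged counts from the N0-B0-NBR rows of Table 11 (N = 470).
rows = cascade_counts(470, [("N", 171), ("B", 88), ("NBR", 3)])
for row in rows:
    print(row)

# One hypothetical subject: negative on N, positive on B, so NBR is never applied.
print(cascade_label([("N", 0), ("B", 1)]))
```

The remaining counts after each step (299, 211, 208) match the ASD (-) counts shown for Universal N, the B row of N0-B, and the NBR row of N0-B0-NBR.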

## Figure 30. Workflow with prevalence: (A) Universal N; (B) Universal NBR; (C) Universal R; and (D) N0-NBR. BSID-III, Bayley scales of infant development version III; M-CHAT-R, modified checklist for autism in toddlers, revised; N, neonatal risk factors; NBR, neonatal risk factors, BSID-III, and M-CHAT-R; R, M-CHAT-R.